Abstract

Recent advances in associative memory design through structured pattern sets and graph-based
inference algorithms have allowed the reliable learning and retrieval of an exponential number of
patterns. Both these and classical associative memories, however, have assumed internally noiseless 
computational nodes. This paper considers the setting when internal computations are also noisy. 
Even if all components are noisy, the final error probability in recall can often be made exceedingly 
small, as we characterize. There is a threshold phenomenon. We also show how to optimize inference 
algorithm parameters when knowing statistical properties of internal noise. 

I. Introduction 

Associative memories are a kind of neural network that have shown great promise for
their ability to learn patterns from presented inputs, store a large number of patterns, and
retrieve them reliably in the face of noisy queries [1]-[3]. In particular, associative memories
are designed to memorize a set of given patterns, so that later, corrupted versions of the
memorized patterns may be presented and the correct memorized pattern retrieved.

Although information storage and retrieval systems are just communication systems from
the present to the future and seemingly fall naturally into the information-theoretic framework
where an exponential number of messages can be communicated reliably using a linear
number of symbols [4], classical associative memories could only store a linear number of
patterns with a linear number of symbols [2].

A primary shortcoming of classical associative memories had been their requirement of
memorizing a set of randomly chosen patterns. By enforcing structure and redundancy in the
possible set of memorizable patterns (much like natural stimuli [5] and like codewords in
error-control codes), new advances in associative memory design have allowed an exponential
number of patterns with a linear number of symbols [6], [7], just like in communication
systems.

Since people have strong abilities in learning, storing, and reliably retrieving patterns [8],
one might wonder if the human brain is operating close to information-theoretic limits and
whether it uses associative memory. Indeed, both information-theoretic and associative memory
models of storage have been used in neuroscience to predict experimentally measurable
properties of synapses in the mammalian brain [9], [10].






Although noise is present in the computational operations of the brain [11],
both classical and modern associative memory models assume no internal noise in the
computational nodes [1], [7]. The purpose of the present paper is to include internal noise
in models of associative memories and study whether they are still able to operate reliably.

We revisit a multi-level, graph code-based, associative memory model [7] and find that
even if all components are noisy, the final error probability in recall can be made exceedingly
small, as we characterize. There is a threshold phenomenon. We also show how to optimize
algorithm parameters when knowing statistical properties of internal noise.

Fig. 1: The proposed neural associative memory with overlapping clusters. (a) Bipartite graph $G$. (b) Contraction graph $\tilde{G}$.

Reliably storing information in memory systems constructed completely from unreliable
components is a classical problem in fault-tolerant computing [12]-[14], where achievability
schemes have essentially used random access memory architectures with sequential correcting
networks. Although direct comparison is difficult since notions of circuit complexity are
slightly different, our work also demonstrates that associative memory architectures can store
information reliably despite being constructed from unreliable components.

II. Problem Setting and Notation 

In our model, a neural associative memory is represented by a weighted bipartite graph
$G$ with $n$ pattern nodes, $x_1, x_2, \ldots, x_n$, and $m$ constraint nodes, $y_1, y_2, \ldots, y_m$. Graph $G$ is
composed of $L$ clusters $G^{(1)}, G^{(2)}, \ldots, G^{(L)}$, each of which is again a bipartite graph. More
specifically, cluster $G^{(\ell)}$ consists of $n_\ell$ pattern nodes $x_1^{(\ell)}, x_2^{(\ell)}, \ldots, x_{n_\ell}^{(\ell)}$, and $m_\ell$ constraint
nodes $y_1^{(\ell)}, y_2^{(\ell)}, \ldots, y_{m_\ell}^{(\ell)}$. The edge weight matrix $W$ of the graph $G$ is chosen such that

$$W \cdot x = 0, \quad \text{for all } x \in \mathcal{X}, \tag{1}$$

where $\mathcal{X}$ is the database of $C$ patterns $x$ of length $n$. The matrix $W$ can, for instance, be
determined by applying a learning rule to $\mathcal{X}$, cf. [7]. Equation (1) can be written equivalently
as $W^{(\ell)} \cdot x^{(\ell)} = 0$, where $x^{(\ell)} = (x_1^{(\ell)}, x_2^{(\ell)}, \ldots, x_{n_\ell}^{(\ell)})$ denotes the $\ell$th subpattern and $W^{(\ell)}$ denotes
the weight matrix of cluster $G^{(\ell)}$. Note that due to overlaps, a pattern node can be a member
of multiple subpatterns, as shown in Figure 1a. This property, together with the constraints
imposed by (1), helps in recalling the memorized patterns $\mathcal{X}$ even in the presence of noise.

We assume that pattern elements are non-negative integers as they simply represent the 
firing rates of neurons. 

The goal of each cluster is to be able to correct one input error. To this end, an iterative 
decoding procedure is performed. In contrast to message-passing decoding of LDPC codes, 
messages on outgoing links of a pattern/constraint node are all the same: the same message 
is broadcast to all neighbors since neurons do not distinguish between their neighbors. 

With slight abuse of notation, let us denote the messages transmitted by pattern node $i$
and constraint node $j$ at round $t$ with $x_i(t)$ and $y_j(t)$, respectively. In round 0, the pattern
nodes are initialized by a pattern $x$ sampled from the dataset $\mathcal{X}$, plus a noise vector $z$, i.e.,
$x(0) = x + z$. We further define $x^{(\ell)}(0) = x^{(\ell)} + z^{(\ell)}$, where $z^{(\ell)}$ is the realization of noise
on subpattern $x^{(\ell)}$. In this work, we restrict $z_i \in \{-1, 0, 1\}$.

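In round $t$, the pattern and constraint neurons update their states based on feedback from
neighbors. However, neuronal computation is faulty and so neuron decisions are not always
reliable. The decision making criterion for pattern node $i$ in cluster $\ell$ is

$$x_i^{(\ell)}(t+1) = \begin{cases} x_i^{(\ell)}(t) - \mathrm{sign}\left(g_i^{(\ell)}(t)\right), & \text{if } |g_i^{(\ell)}(t)| > \varphi \\ x_i^{(\ell)}(t), & \text{o.w.,} \end{cases} \tag{2}$$

where $\varphi$ is the update threshold and $g_i^{(\ell)}(t)$ is given by

$$g_i^{(\ell)}(t) = \frac{\left(\mathrm{sign}(W^{(\ell)})^\top \cdot y^{(\ell)}(t)\right)_i}{d_i} + u_i. \tag{3}$$

Here, $d_i$ is the degree of pattern node $i$, $y^{(\ell)}(t) = [y_1^{(\ell)}(t), \ldots, y_{m_\ell}^{(\ell)}(t)]$ is the vector of messages
transmitted by the constraint neurons in cluster $\ell$, and $u_i$ is the random noise affecting pattern
node $i$. Herein, we consider a bounded noise model such that $u_i$ is uniformly distributed in
the interval $[-\upsilon, \upsilon]$, for some $\upsilon < 1$.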

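On the constraint side, the update rule is

$$y_i^{(\ell)}(t) = f\left(h_i^{(\ell)}(t), \psi\right) = \begin{cases} +1, & \text{if } h_i^{(\ell)}(t) \geq \psi \\ 0, & \text{if } -\psi < h_i^{(\ell)}(t) < \psi \\ -1, & \text{o.w.,} \end{cases} \tag{4}$$

where $\psi$ is the update threshold and $h_i^{(\ell)}(t)$ is defined as

$$h_i^{(\ell)}(t) = \left(W^{(\ell)} \cdot x^{(\ell)}(t)\right)_i + v_i, \tag{5}$$

in which $x^{(\ell)}(t) = [x_1^{(\ell)}(t), \ldots, x_{n_\ell}^{(\ell)}(t)]$ is the vector of messages transmitted by the pattern
neurons and $v_i$ is the random noise affecting node $i$. As before, we consider a bounded noise
model for the $v_i$'s such that they are uniformly distributed in the interval $[-\nu, \nu]$ for some $\nu < 1$.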
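The two noisy update rules can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the list-based weight layout, and the sign convention (a pattern neuron moves one step against the sign of its feedback, which is what the later analysis of a +1 error assumes) are assumptions made for illustration.

```python
import random

def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)

def constraint_update(weights_row, x, psi, nu, rng=random):
    """Noisy constraint-neuron rule: threshold h = <w, x> + v at +/- psi."""
    v = rng.uniform(-nu, nu)  # bounded internal noise of the constraint neuron
    h = sum(w * xi for w, xi in zip(weights_row, x)) + v
    if h >= psi:
        return 1
    if h <= -psi:
        return -1
    return 0

def pattern_update(x_i, weights_col, y, phi, upsilon, rng=random):
    """Noisy pattern-neuron rule: step against sign(g) only if |g| > phi."""
    degree = sum(1 for w in weights_col if w != 0)
    if degree == 0:
        return x_i  # an isolated neuron receives no feedback
    u = rng.uniform(-upsilon, upsilon)  # bounded internal noise of the pattern neuron
    feedback = sum(sign(w) * yj for w, yj in zip(weights_col, y))
    g = feedback / degree + u
    return x_i - sign(g) if abs(g) > phi else x_i
```

With both noise parameters set to zero, a single +1 error on the first entry of a pattern satisfying $W \cdot x = 0$ is detected by the constraint neuron attached to it and removed by one pattern update, while a sufficiently large threshold keeps the uncorrupted neighbors unchanged.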

For our asymptotic analysis, we need to define the degree distribution associated with a
bipartite graph from an edge perspective. To this end, we define $\lambda(z) = \sum_j \lambda_j z^{j-1}$ and
$\rho(z) = \sum_j \rho_j z^{j-1}$, where $\lambda_j$ (resp., $\rho_j$) equals the fraction of edges that connect to pattern
(resp., constraint) nodes of degree $j$. Similarly, denote by $\lambda^{(\ell)}$ and $\rho^{(\ell)}$ the pattern/constraint
degree distributions of cluster $\ell$ from the edge perspective.
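As a small illustration of the edge-perspective definition, the sketch below computes the coefficients from a list of node degrees and evaluates the resulting polynomial. The helper names and the dictionary representation are assumptions chosen for this example.

```python
from collections import Counter

def edge_degree_distribution(degrees):
    """Map degree j to the fraction of edges attached to nodes of degree j."""
    total_edges = sum(degrees)
    counts = Counter(degrees)
    return {j: j * c / total_edges for j, c in counts.items() if j > 0}

def evaluate(dist, z):
    """Evaluate the polynomial sum_j dist[j] * z**(j - 1)."""
    return sum(frac * z ** (j - 1) for j, frac in dist.items())
```

For three nodes of degrees 2, 2 and 4 there are 8 edge endpoints, half attached to degree-2 nodes and half to the degree-4 node, so both coefficients equal 0.5 and the polynomial evaluates to 1 at $z = 1$.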

III. Main Results 

Building on the (noisy) update rules presented in the previous section, we use a combination
of Alg. 1 and Alg. 2 to deal with the internal and external noise in recall; these algorithms
are modified from [7] to account for unreliable computations.

Alg. 1 aims at canceling the effect of internal noise and correcting a single external error
within a cluster by a series of backward and forward iterations. The messages transmitted by
pattern neuron $j$ and constraint neuron $i$ in cluster $\ell$ are represented by $x_j^{(\ell)}$ and $y_i^{(\ell)}$, respectively.
We let $P_c$ represent the average probability that a cluster can successfully correct one external
error.

Since clusters overlap, they can help each other in resolving external errors. This is done
by Alg. 2. The following theorem gives a simple condition under which Alg. 2 can correct a
linear fraction of external errors (in terms of $n$) with an exceedingly small error probability.
The condition involves $\tilde{\lambda}$ and $\tilde{\rho}$, the degree distributions of the contracted graph $\tilde{G}$, defined
as follows. For each cluster $G^{(\ell)}$ we contract its set of constraint nodes into a single node
(see Fig. 1b).
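The cluster-by-cluster peeling idea can be sketched as follows. This is a schematic under stated assumptions, not the paper's code: `is_satisfied` and `decode_cluster` are hypothetical callbacks standing in for the constraint check and for the within-cluster correction, clusters are given as lists of pattern indices, "initial state" is read as the state just before the cluster attempt, and the cap of 40 rounds mirrors the iteration limit reported in the simulations.

```python
def sequential_peeling(x, clusters, is_satisfied, decode_cluster, max_rounds=40):
    """Visit unsatisfied clusters in turn; keep a cluster's changes only if it becomes satisfied."""
    x = list(x)
    for _ in range(max_rounds):
        if all(is_satisfied(c, x) for c in clusters):
            return x
        for c in clusters:
            if is_satisfied(c, x):
                continue
            saved = [x[i] for i in c]      # remember the state before the attempt
            decode_cluster(c, x)           # in-place attempt (stand-in for Alg. 1)
            if not is_satisfied(c, x):
                for i, value in zip(c, saved):
                    x[i] = value           # revert a failed attempt
    return x if all(is_satisfied(c, x) for c in clusters) else None  # None signals failure
```

In a toy setting where each cluster can repair at most one wrong entry, a cluster holding two errors fails at first, but succeeds on the next pass once an overlapping cluster has removed one of them. This is the mechanism by which overlapping clusters help each other.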

Theorem 1: Under the assumptions that graph $\tilde{G}$ grows large and is chosen randomly
with degree distributions given by $\tilde{\lambda}$ and $\tilde{\rho}$, Alg. 2 is successful if $\epsilon\tilde{\lambda}(1 - P_c\,\tilde{\rho}(1 - z)) < z$
for $z \in (0, \epsilon)$.

Proof: The complete proof can be found in the Appendix. In short, let a cluster receive
an error message from its neighboring pattern nodes with probability $z$. Consider a given noisy
pattern neuron that is connected to cluster $\ell$. Moreover, let $\Pi^{(\ell)}(t)$ denote the probability
that the cluster node with degree $\tilde{d}_\ell$ sends an error message in iteration $t$ of Alg. 2.
This happens if either the node is connected to more than one noisy pattern neuron
(in which case it sends out an error message with probability one), or the node does not
receive any error message from its other neighbors (in which case it sends out an error message
with probability at most $P_1 = 1 - P_c$). Hence, $\Pi^{(\ell)}(t) = 1 - P_c (1 - z(t))^{\tilde{d}_\ell - 1}$.

Now, let $\Pi(t)$ represent the average probability that a cluster node sends a message declaring
that at least one of its constraint neurons is violated. Thus we get $\Pi(t) = 1 - P_c\,\tilde{\rho}(1 - z(t))$.
Then, a given pattern neuron $x_i$ with degree $d_i$ will remain noisy in iteration $t+1$ of Alg. 2 if
it was noisy in the first place and, in iteration $t+1$, all of its neighboring cluster nodes
send a violation message. Therefore, the probability of this node being noisy will be
$z(0)\,\Pi(t)^{d_i}$. As a result, noting that $z(0) = \epsilon$, the average probability that a pattern neuron remains
noisy will be

$$z(t+1) = \epsilon \sum_i \tilde{\lambda}_i\, \Pi(t)^i = \epsilon\tilde{\lambda}\left(1 - P_c\,\tilde{\rho}(1 - z(t))\right). \tag{6}$$

Therefore, the decoding operation will be successful if $z(t+1) < z(t)$, $\forall t$. As a result, we
must look for the maximum $\epsilon$ such that $\epsilon\tilde{\lambda}(1 - P_c\,\tilde{\rho}(1 - z)) < z$ for $z \in [0, \epsilon]$. ■

Algorithm 1 Intra-Module Error Correction
Input: Training set $\mathcal{X}$, thresholds $\varphi$ and $\psi$, iteration $t_{\max}$
Output: $x_1^{(\ell)}, x_2^{(\ell)}, \ldots, x_{n_\ell}^{(\ell)}$
1: for $t = 1 \to t_{\max}$ do
2:   Forward iteration: Calculate the weighted input sum $h_i^{(\ell)} = \sum_{j=1}^{n_\ell} W_{ij}^{(\ell)} x_j^{(\ell)} + v_i$ for each neuron $y_i^{(\ell)}$ and set $y_i^{(\ell)} = f(h_i^{(\ell)}, \psi)$.
3:   Backward iteration: Each neuron $x_j^{(\ell)}$ computes $g_j^{(\ell)} = \dfrac{\sum_{i=1}^{m_\ell} \mathrm{sign}(W_{ij}^{(\ell)})\, y_i^{(\ell)}}{\sum_{i=1}^{m_\ell} \mathrm{sign}(|W_{ij}^{(\ell)}|)} + u_j$.
4:   Update the state of each pattern neuron $j$ according to $x_j^{(\ell)} = x_j^{(\ell)} - \mathrm{sign}(g_j^{(\ell)})$ only if $|g_j^{(\ell)}| > \varphi$.
5:   $t \leftarrow t + 1$
6: end for

Algorithm 2 Sequential Peeling Algorithm [7]
Input: $\tilde{G}, G^{(1)}, G^{(2)}, \ldots, G^{(L)}$
Output: $x_1, x_2, \ldots, x_n$
1: while there is an unsatisfied $v^{(\ell)}$ do
2:   for $\ell = 1 \to L$ do
3:     If $v^{(\ell)}$ is unsatisfied, apply Alg. 1 to cluster $G^{(\ell)}$.
4:     If $v^{(\ell)}$ remained unsatisfied, revert the state of pattern neurons connected to $v^{(\ell)}$ to their initial state. Otherwise, keep their current states.
5:   end for
6: end while
7: Declare $x_1, x_2, \ldots, x_n$ if all $v^{(\ell)}$'s are satisfied. Otherwise, declare failure.

Thm. 1 provides insight on the role of $P_c$: if it is equal to 1, we get the same result as
the noise-free case [7]. However, as $P_c$ moves towards 0, the value of $z(t)$ grows towards $\epsilon$,
which means we cannot correct any input error.

Fig. 2: The behavior of $f(z) = z - \epsilon\tilde{\lambda}(1 - P_c\,\tilde{\rho}(1 - z))$ as a function of $z$ and for different
values of $P_c$. In all cases, $\epsilon = 0.1$.

Further note that once $P_c < 1$, $z(t)$ will be bounded away from 0 because $\tilde{\lambda}(x)$ is an
increasing function of $x$. Hence, since $1 - P_c\,\tilde{\rho}(1 - z) \geq 1 - P_c$, we have $z(t+1) \geq \epsilon\tilde{\lambda}(1 - P_c)$.
Fig. 2 illustrates how $z - \epsilon\tilde{\lambda}(1 - P_c\,\tilde{\rho}(1 - z))$ behaves as a function of $z$ for different values
of $P_c$. Reliable storage occurs when the expression is positive, i.e., when the condition of Theorem 1 holds.
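The recursion behind Theorem 1 is easy to explore numerically. The sketch below iterates $z \mapsto \epsilon\lambda(1 - P_c\,\rho(1 - z))$ and bisects for the largest external error rate that is driven to zero. It is an illustrative sketch only: the degree distributions are passed as dictionaries mapping a degree to its edge fraction, the polynomials use the edge-perspective form from Section II, and the function names, iteration cap, and tolerance are arbitrary choices.

```python
def poly(dist, z):
    """Evaluate sum_j dist[j] * z**(j - 1) for an edge-perspective distribution."""
    return sum(c * z ** (j - 1) for j, c in dist.items())

def recall_succeeds(eps, p_c, lam, rho, iters=2000, tol=1e-9):
    """Iterate the recursion from z(0) = eps; report whether z falls below tol."""
    z = eps
    for _ in range(iters):
        z = eps * poly(lam, 1.0 - p_c * poly(rho, 1.0 - z))
        if z < tol:
            return True
    return False

def threshold(p_c, lam, rho, steps=30):
    """Bisection estimate of the largest eps for which the recursion converges to 0."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if recall_succeeds(mid, p_c, lam, rho):
            lo = mid
        else:
            hi = mid
    return lo
```

For a regular example with all pattern degrees 3 and all cluster degrees 6, the case $P_c = 1$ reduces to the familiar erasure-channel recursion, whereas any $P_c < 1$ leaves a strictly positive fixed point, in line with the remark that $z(t)$ is then bounded away from 0.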

A. Estimating $P_c$

To bound $P_c$, consider four event probabilities for a cluster:

$\pi_0^{(\ell)}$ (resp. $P_0^{(\ell)}$): The probability that a constraint neuron (resp. pattern neuron) in cluster
$\ell$ makes a wrong decision due to its internal noise when there is no external noise
introduced to cluster $\ell$, i.e., $\|z^{(\ell)}\|_0 = 0$.

$\pi_1^{(\ell)}$ (resp. $P_1^{(\ell)}$): The probability that a constraint neuron (resp. pattern neuron) in cluster
$\ell$ makes a wrong decision due to its internal noise when one input error (external noise)
is introduced, i.e., $\|z^{(\ell)}\|_0 = 1$.

Notice $P_c^{(\ell)} = 1 - P_1^{(\ell)}$.

The following lemma, with proof in the Appendix, shows that when update thresholds $\varphi$
and $\psi$ are chosen properly, the probability of making a mistake in the absence of external
noise tends to zero.

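Lemma 2: In absence of external noise, the probability that a constraint neuron (resp.
pattern neuron) makes a wrong decision due to its internal noise is given by

$$\pi_0^{(\ell)} = \max\left(0, \frac{\nu - \psi}{\nu}\right) \tag{7}$$

$$\left(\text{resp. } P_0^{(\ell)} = \max\left(0, \frac{\upsilon - \varphi}{\upsilon}\right)\right), \tag{8}$$

which will be 0 for $\psi > \nu$ (resp. $\varphi > \upsilon$).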
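The closed form in Lemma 2 is the two-sided tail probability of a uniform random variable, which a short simulation can confirm. The sketch below is illustrative only; the function names, trial count, and seed are assumptions.

```python
import random

def no_error_mistake_prob(noise, threshold):
    """Closed form Pr{|n| > threshold} for n uniform on [-noise, noise]."""
    return max(0.0, (noise - threshold) / noise)

def monte_carlo_mistake_prob(noise, threshold, trials=100_000, seed=0):
    """Empirical estimate of the same probability from seeded uniform samples."""
    rng = random.Random(seed)
    hits = sum(abs(rng.uniform(-noise, noise)) > threshold for _ in range(trials))
    return hits / trials
```

Choosing the threshold at or above the noise bound makes the probability exactly zero, which is why the analysis later fixes $\psi > \nu$ and $\varphi > \upsilon$.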

Next, we derive an upper bound on the probability that a constraint node makes a mistake in
the presence of one external error; the proof is given in the Appendix.

Lemma 3: In presence of a single external error, the probability that a constraint neuron
makes a wrong decision due to its internal noise is upper bounded by

$$\pi_1^{(\ell)} \leq \max\left(0, \frac{\nu - (\eta - \psi)}{2\nu}\right),$$

where $\eta = \min_{i,j:\, W_{ij} \neq 0} |W_{ij}|$ is the minimum absolute value of the non-zero weights in the
neural graph and is chosen such that $\eta \geq \psi$.¹
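To see how the bound of Lemma 3 relates to the exact probability, the sketch below computes $\Pr\{|w + v| < \psi\}$ for a uniform $v$ as the length of an interval overlap and compares it with the bound. The function names are hypothetical, and the code assumes the setting of the lemma: a single +1 error seen through a weight $w$ with $|w| \geq \eta$.

```python
def silent_prob(w, psi, nu):
    """Exact Pr{|w + v| < psi} for v uniform on [-nu, nu]."""
    lo = max(-nu, -psi - w)   # smallest v that keeps w + v above -psi
    hi = min(nu, psi - w)     # largest v that keeps w + v below psi
    return max(0.0, hi - lo) / (2 * nu)

def lemma3_bound(eta, psi, nu):
    """Upper bound max(0, (nu - (eta - psi)) / (2 nu)) on the silent probability."""
    return max(0.0, (nu - (eta - psi)) / (2 * nu))
```

The bound is met with equality when $|w| = \eta$ and becomes loose for larger weights. For example, with $\nu = 0.3$, $\psi = 0.4$ and $\eta = 0.5$ both expressions equal $1/3$ at $w = 0.5$, while a weight of 0.8 is never silenced.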

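Finally, we obtain an upper bound on $P_1^{(\ell)}$. For brevity, we leave details to the Appendix.
Briefly, let us assume w.l.o.g. that the first node $x_1^{(\ell)}$ is the one corrupted with noise +1.
We start by calculating the probability that a non-corrupted pattern node $x_j^{(\ell)}$ makes a
mistake and changes its state in round 1. Let us denote this probability by $q_1^{(\ell)}$. Now to
calculate $q_1^{(\ell)}$, assume node $x_j^{(\ell)}$ has degree $d_j$ and has $b$ common neighbors with node $x_1^{(\ell)}$,
the corrupted pattern node.

Out of these $b$ common neighbors, $b_c$ will send ±1 messages and the others will mistakenly
send nothing. Let $o_j$ denote $\left(\mathrm{sign}(W^{(\ell)})^\top \cdot y^{(\ell)}\right)_j$. Then, we have

$$q_1^{(\ell)}(o_j) = \begin{cases} 1, & \text{if } |o_j| \geq (\upsilon + \varphi) d_j \\ \max\left(0, \dfrac{\upsilon - \varphi}{\upsilon}\right), & \text{if } |o_j| \leq |\upsilon - \varphi|\, d_j \\ \dfrac{\upsilon - (\varphi - o_j/d_j)}{2\upsilon}, & \text{if } |o_j - \varphi d_j| \leq \upsilon d_j \\ \dfrac{\upsilon - (\varphi + o_j/d_j)}{2\upsilon}, & \text{if } |o_j + \varphi d_j| \leq \upsilon d_j. \end{cases} \tag{9}$$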

We now average the above equation over $o_j$, $b_c$ and $b$, yielding:

$$q_1^{(\ell)} = \sum_{b=0}^{d_j} p_b \sum_{b_c=0}^{b} p_{b_c} \sum_{e=0}^{b_c} \binom{b_c}{e} (1/2)^{b_c}\, q_1^{(\ell)}(2e - b_c), \tag{10}$$

where $q_1^{(\ell)}(2e - b_c)$ is given by (9), $p_b$ is the probability of having $b$ common neighbors and
is estimated by $\binom{d_j}{b} (1 - \bar{d}^{(\ell)}/m_\ell)^{d_j - b} (\bar{d}^{(\ell)}/m_\ell)^{b}$, with $\bar{d}^{(\ell)}$ being the average degree of pattern
nodes in cluster $\ell$. Furthermore, $p_{b_c}$ is the probability of having $b - b_c$ out of these $b$ nodes
making mistakes. Hence, $p_{b_c} = \binom{b}{b_c} (\pi_1^{(\ell)})^{b - b_c} (1 - \pi_1^{(\ell)})^{b_c}$.

Now we turn our attention to the probability that the corrupted node $x_1$ makes a mistake: either
not updating or updating in the wrong direction. Recall we had assumed the external noise
for $x_1$ is +1, and so the wrong direction for node $x_1$ is increasing its current value instead of
decreasing it. Furthermore, assume that out of the $d_1$ neighbors of $x_1$, $j$ of them have
made mistakes and will not send any messages to $x_1$. Thus, the decision parameter of $x_1$
will be $g_1^{(\ell)} = u + (d_1 - j)/d_1$. Letting the probability of making a mistake in this situation
be $q_2^{(\ell)}$, we have

$$q_2^{(\ell)} = \Pr\left\{\frac{d_1 - j}{d_1} + u < \varphi\right\}, \tag{11}$$

which can be simplified to:

$$q_2^{(\ell)}(j) = \begin{cases} 1, & \text{if } |j| \geq (1 + \upsilon - \varphi) d_1 \\ \max\left(0, \dfrac{\upsilon - \varphi}{\upsilon}\right), & \text{if } |j| \leq (1 - \upsilon - \varphi) d_1 \\ \dfrac{\upsilon + \varphi - (d_1 - j)/d_1}{2\upsilon}, & \text{if } |\varphi d_1 - (d_1 - j)| \leq \upsilon d_1. \end{cases} \tag{12}$$



¹ This condition can be enforced during simulations as long as $\psi$ is not too large, which itself is determined by the level
of constraint neuron internal noise, $\nu$, as we must have $\psi > \nu$.

Fig. 3: The behavior of $P_1^{(\ell)}$ as a function of $\varphi$ for different values of the noise parameter $\upsilon$.
Here, we have $\pi_1^{(\ell)} = 0.01$.

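Noting the probability of making $j$ mistakes on the constraint side is $\binom{d_1}{j} (\pi_1^{(\ell)})^{j} (1 - \pi_1^{(\ell)})^{d_1 - j}$,
we get

$$q_2^{(\ell)} = \sum_{j=0}^{d_1} \binom{d_1}{j} (\pi_1^{(\ell)})^{j} (1 - \pi_1^{(\ell)})^{d_1 - j}\, q_2^{(\ell)}(j), \tag{13}$$

where $q_2^{(\ell)}(j)$ is given by (12).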
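The averaging in (13) is straightforward to evaluate numerically. The sketch below is an assumption-laden illustration: rather than coding a piecewise expression, it evaluates the probability in (11) directly by clipping the uniform distribution function of $u$, and then averages over a binomial number $j$ of silent constraint neurons. Function and parameter names are hypothetical.

```python
from math import comb

def q2(j, d1, phi, upsilon):
    """Pr{(d1 - j)/d1 + u < phi} for u uniform on [-upsilon, upsilon], as in (11)."""
    p = (phi - (d1 - j) / d1 + upsilon) / (2 * upsilon)
    return min(1.0, max(0.0, p))

def q2_average(d1, pi1, phi, upsilon):
    """Average q2(j) over j ~ Binomial(d1, pi1), as in (13)."""
    return sum(
        comb(d1, j) * pi1 ** j * (1 - pi1) ** (d1 - j) * q2(j, d1, phi, upsilon)
        for j in range(d1 + 1)
    )
```

For instance, with $d_1 = 4$, $\varphi = 0.5$ and $\upsilon = 0.2$, the corrupted node never errs when all four neighbors report ($j = 0$), errs with probability 0.5 when two are silent, and always errs when all are silent.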



Putting things together, the overall probability $P_1^{(\ell)}$ that a pattern neuron makes a mistake with one
bit of external noise is obtained by combining $q_1^{(\ell)}$ and $q_2^{(\ell)}$ (equation (14)).
We use this equation to find the best update threshold $\varphi$.

B. Choosing the best $\varphi$

We use numerical methods applied to (14) to find the best $\varphi$, ensuring tight results. Loose
bounds on $P_1^{(\ell)}$ allow an analytical approximation to the best $\varphi$, as given in the Appendix.
Fig. 3 illustrates the behavior of the error probability as a function of $\varphi$ for different values
of $\upsilon$ and for $\pi_1^{(\ell)} = 0.01$. As evident from the figure, choosing a larger $\varphi$ results in smaller
error probability. Moreover, in all cases we have $\varphi^* > \upsilon$. We use this choice, which also
makes $P_0^{(\ell)} = 0$.



IV. Simulations 



To investigate the performance of the proposed algorithm in dealing with external noise,
we have used a modified version of the learning algorithm proposed in [7], in order to account
for the internal noise affecting the neurons.

We have considered a network of $n = 400$ pattern neurons with $L = 50$ clusters and on
average 40 pattern and 20 constraint nodes in each cluster. Similar to [7], the external noise
is modeled by randomly choosing a pattern node with probability $p_e$ and corrupting it with
an additive ±1 noise. At this point, Alg. 2 is utilized to eliminate the external noise. Once
finished, we calculate the bit error rate (BER) by counting the number of places where the output
of the algorithm differs from the correct pattern.

Fig. 4: The final pattern error probability for a network with $n = 400$, $L = 50$, and on
average 40 pattern and 20 constraint nodes per cluster, cf. [7]. The blue curves correspond
to the noiseless neural network.
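The external-noise model and the BER measurement can be sketched in a few lines. The code is a simplified illustration, not the simulation code used for the figures: the function names are hypothetical, each entry is corrupted independently, and no attempt is made to keep corrupted entries non-negative.

```python
import random

def corrupt(pattern, p_e, rng):
    """Add +1 or -1 to each entry independently with probability p_e."""
    return [x + rng.choice((-1, 1)) if rng.random() < p_e else x for x in pattern]

def bit_error_rate(recalled, original):
    """Fraction of positions where the recalled pattern differs from the original."""
    return sum(a != b for a, b in zip(recalled, original)) / len(original)
```

A typical experiment would corrupt a stored pattern, run the recall algorithm on the result, and average `bit_error_rate` over many trials for each value of the corruption probability.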

A. Effect of internal noise on BER 

Fig. 4 illustrates the final error rate of the proposed algorithm for different values of $\upsilon$
(noise parameter for pattern nodes) and $\nu$ (noise parameter for constraint nodes). The dashed
lines correspond to the simulation results and the solid lines are the theoretical upper bounds.
As evident from the figure, we witness a threshold phenomenon, i.e., the BER is negligible
for $\epsilon < \epsilon^*$, and it grows as we move beyond this threshold. Furthermore, except for a few
cases near the threshold, the simulation results are better than the theoretical upper bounds.
The gap becomes quite large when one of the internal noise parameters is relatively large while the other is close to zero.

Another interesting trend in Fig. 4 is that internal noise sometimes helps the
network achieve better performance. This phenomenon, which is very similar to what is
known as stochastic resonance in the literature [15], [16], also resembles the one
observed in genetic algorithms, where a limited amount of noise can help the network avoid getting
stuck in local minima. To see why, note that Alg. 2 introduces no new errors. However, in
each iteration, it might simply happen that the internal noise of neurons acts in our favor and
helps clusters eliminate their own external noise and that of neighbouring clusters.
As a result, a small amount of deviation introduced by the internal noise might be enough for
Alg. 2 to avoid places where the "noiseless" architecture inevitably gets stuck. Nevertheless,
when the noise becomes too large, the performance deteriorates (the black curves in Fig. 4).

These phenomena are inspected more closely in Figure 5, where $\epsilon$ is fixed to 0.125 while $\upsilon$
and $\nu$ are varied. Figures 6a and 6b illustrate the projected version of the 3D image on two
dimensions to investigate the effect of $\upsilon$ and $\nu$ separately on the BER. As evident from the
figures, a fair amount of internal noise at both pattern and constraint neurons could improve
the performance. However, as shown in Figure 6b, there is an optimal value for $\nu$ which
yields the lowest BER (around $\nu = 0.25$). While Figure 6a seems to suggest that higher
values of $\upsilon$ lead to better performance, our newer results (not shown here) indicate
that the same phenomenon exists for $\upsilon$ as well, i.e., if we increase $\upsilon$ beyond some threshold,
the performance deteriorates.

Fig. 5: The final bit error probability when $\epsilon = 0.125$ as a function of internal noise parameters
at the pattern and constraint neurons side, denoted by $\upsilon$ and $\nu$, respectively.

Fig. 6: The final bit error probability as a function of internal noise parameters at the pattern
and constraint neurons side for $\epsilon = 0.125$. (a) Final BER as a function of $\upsilon$. (b) The effect of $\nu$ on the final BER.

B. Effect of internal noise on decoding time 

Another interesting aspect to consider is the effect of the internal noise on the number
of iterations needed to perform the error correction algorithm. Figure 7 illustrates the number of
iterations performed by Algorithm 2 for correcting the external errors when $\epsilon$ was fixed to
0.125. The maximum number of iterations was set to 40. Thus, the areas
in the figure where the number of iterations equals 40 indicate decoding failure.

Figures 8a and 8b are projected versions of Figure 7 on two dimensions and show the
average number of decoding iterations as a function of $\upsilon$ and $\nu$, respectively.

As evident from the figures, the amount of internal noise drastically affects the speed of
Algorithm 2. For one, internal noise in pattern neurons is necessary for Algorithm 2 to be
successful, as shown in Figure 8b. The same figure also shows that there is an optimal value
for the noise at constraint neurons (around $\nu \approx 0.25$) for which the number of iterations
is minimized. Furthermore, a close inspection of Figure 8a reveals the same phenomenon:
around $\upsilon = 0.4$, the number of global iterations seems to reach a minimum. Another
interesting behavior uncovered by Figure 8a is that for some values of $\upsilon$ (e.g., $\upsilon = 0.1$),
networks with less internal noise at the constraint side are faster in dealing with external
errors.

Finally, Figure 9 illustrates the fraction of unsuccessful attempts to correct external errors
using Algorithm 2, i.e., the fraction of times the algorithm has terminated unsuccessfully.
As is clear from the figure, in most cases higher amounts of internal noise
significantly reduce the number of unsuccessful attempts.

Fig. 7: The effect of internal noise on the number of iterations performed by Algorithm 2 to
correct external noise, for different values of $\upsilon$ and $\nu$ with $\epsilon = 0.125$.

Fig. 8: The effect of internal noise on the number of iterations performed by Algorithm 2 to
correct external noise, for different values of $\upsilon$ and $\nu$ with $\epsilon = 0.125$. An average iteration
number of 40 indicates the failure of Algorithm 2. (a) Effect of internal noise at pattern neurons side. (b) Effect of internal noise at constraint neurons side.

Fig. 9: The fraction of unsuccessful attempts in Algorithm 2 as a function of $\epsilon$ for different
values of $\upsilon$ with $\nu = 0$.

C. Effect of internal noise on the performance of the neural network in absence of external 
noise 

At this point, we are interested in investigating the effects of internal noise on a particular
network without external noise. More specifically, we assume $\epsilon = 0$ but $\upsilon > 0$, $\nu > 0$.
Furthermore, in contrast to what we have considered so far, the pattern neurons perform one
update before starting Algorithm 2, and a ±1 noise is added to them with probability $\upsilon$.²
Therefore, even if there is no external noise, the pattern neurons could be corrupted due to the
internal noise parameter $\upsilon$. The rest of the scenario is the same as before. Figure 10 illustrates
the effect of the internal noise as a function of $\upsilon$ and $\nu$, the noise parameters at the pattern
and constraint nodes, respectively.

As evident from the figure, the higher the noise is on the pattern side, the higher the PER will
be (i.e., the performance deteriorates as internal noise increases). This behavior is shown
in Figures 11a and 11b for closer inspection.

Interestingly, this behavior is very similar to the effect of heat on the performance of
wireless telegraphy operators observed in [17] (see Fig. 2 in that paper). We see virtually the
same trend here as well. The two phenomena might be related, possibly because external
heat translates into neurons with more internal noise.

² Note that this is slightly different from our previous model, in which the additive noise will only result in a ±1 noise
value if the input becomes larger than a threshold $\varphi$.

Fig. 11: The effect of the internal noise on the final Pattern Error Rate (PER) as a function of
$\upsilon$ and $\nu$ in absence of external noise. (a) Effect of internal noise at pattern neurons side. (b) Effect of internal noise at constraint neurons side.

V. Discussion 

We have demonstrated that associative memories still work reliably even when built from 
unreliable hardware, addressing a major problem in fault-tolerant computing and arguing for 
the viability of associative memory models for the (noisy) mammalian brain. Further, we 
found a threshold phenomenon for reliable operation, which manifests the tradeoff between 
the amount of internal noise and the amount of external noise that the system can handle. 

The associative memory design we have proposed uses thresholding operations in the
message-passing algorithm for recall; as part of our investigation, we optimized these neuron
firing thresholds based on the statistics of the internal noise. As noted by Sarpeshkar in
describing the properties of analog and digital computing circuits, "In a cascade of analog
stages, noise starts to accumulate. Thus, complex systems with many stages are difficult
to build. [In digital systems] Round-off error does not accumulate significantly for many
computations. Thus, complex systems with many stages are easy to build" [18]. The key to
our result is capturing this benefit of digital processing: thresholding to prevent the build-up
of errors due to internal noise.

This paper focused on recall; however, learning is the other critical stage of associative
memory operation. Indeed, information storage in nervous systems is said to be subject
to storage (or learning) noise, in situ noise, and retrieval (or recall) noise [10, Fig. 1]. It
should be noted, however, that there is no essential loss in combining learning noise and in situ
noise into what we have called external noise herein, cf. [10, Fn. 1 and Prop. 1]. Thus our
basic qualitative result extends to the setting where the learning and storage phases are also
performed with noisy hardware.

Appendix

Proof of Lemma 2

To calculate the probability that a constraint node makes a mistake when there is no
external noise, consider constraint node $i$ whose decision parameter will be

$$h_i^{(\ell)} = \left(W^{(\ell)} \cdot x^{(\ell)}\right)_i + v_i = v_i.$$

Therefore, the probability of making a mistake will be

$$\pi_0^{(\ell)} = \Pr\{|v_i| > \psi\} = \max\left(0, \frac{\nu - \psi}{\nu}\right). \tag{15}$$

Thus, to make $\pi_0^{(\ell)} = 0$ we will select $\psi > \nu$.³ So from now on, we assume

$$\pi_0^{(\ell)} = 0. \tag{16}$$

Now knowing that the constraint neurons will not send any non-zero messages in absence of external
noise, we focus on the pattern neurons in the same circumstance. A given pattern node $x_j^{(\ell)}$
will receive a zero from all its neighbors among the constraint nodes. Therefore, its decision
parameter will be $g_j^{(\ell)} = u_j$. As a result, a mistake could happen if $|u_j| > \varphi$. The probability
of this event is given by

$$P_0^{(\ell)} = \Pr\{|u_j| > \varphi\} = \max\left(0, \frac{\upsilon - \varphi}{\upsilon}\right). \tag{17}$$

Therefore, to make $P_0^{(\ell)}$ go to zero, we must select $\varphi > \upsilon$. As our numerical analysis in
Section III-B shows, this choice is in harmony with our goal to minimize $P_1^{(\ell)}$ as well.



Proof of Theorem 1

The proof is similar to the proof of Theorem 3.50 in [19]. Each cluster node receives
an error message from its neighboring pattern nodes with probability $z$. Now consider a
given noisy pattern neuron which is connected to a given cluster node $v^{(\ell)}$. Let $\Pi^{(\ell)}(t)$ be the
probability that the cluster node $v^{(\ell)}$, with degree $\tilde{d}_\ell$, sends an error message during iteration
$t$ of Algorithm 2. This event happens if

1) the cluster node $v^{(\ell)}$ receives at least one error message from its other neighbors among
pattern neurons along its input edges, i.e., if it is connected to more than one noisy
pattern neuron. Then, with probability one it sends an error message.

2) the cluster node $v^{(\ell)}$ does not receive any other error messages from its other neighbors.
In this case, it will send an error message with probability at most $P_1 = 1 - P_c$.

Therefore,

$$\Pi^{(\ell)}(t) = 1 - P_c (1 - z(t))^{\tilde{d}_\ell - 1}. \tag{18}$$

As a result, if $\Pi(t)$ denotes the average probability that a cluster node sends a message declaring
the violation of at least one of its constraint neurons, we will have

$$\Pi(t) = \mathbb{E}_{\tilde{d}_\ell}\left\{\Pi^{(\ell)}(t)\right\} = \sum_i \tilde{\rho}_i \left(1 - P_c (1 - z(t))^{i-1}\right) = 1 - P_c\,\tilde{\rho}(1 - z(t)). \tag{19}$$

Now consider a given pattern neuron $x_i$ which is connected to $d_i$ clusters. This node will
remain noisy in iteration $t+1$ of Algorithm 2 if it was noisy in the first place and in iteration
$t+1$ all of its neighboring cluster nodes send a violation message. Therefore, the
probability of this node being noisy will be $z(0)\,\Pi(t)^{d_i}$. As a result, noting that $z(0) = \epsilon$,
the average probability that a pattern neuron remains noisy will be

$$z(t+1) = \epsilon \sum_i \tilde{\lambda}_i\, \Pi(t)^i = \epsilon\tilde{\lambda}(\Pi(t)) = \epsilon\tilde{\lambda}\left(1 - P_c\,\tilde{\rho}(1 - z(t))\right). \tag{20}$$

Therefore, the decoding operation will be successful if $z(t+1) < z(t)$, $\forall t$. As a result, we
must look for the maximum $\epsilon$ such that $\epsilon\tilde{\lambda}(1 - P_c\,\tilde{\rho}(1 - z)) < z$ for $z \in [0, \epsilon]$.

³ Note that this might not be possible in all cases since, as we will see later, the minimum absolute value of network
weights should be at least $\psi$. Therefore, if $\psi$ is too large we might not be able to find a proper set of weights.



Proof of Lemma [3] 

It's now time to consider the situation in which we have one external error. Without loss 
of generality, we assume it is the first pattern node, x\ , that is corrupted with noise whose 
value is +1. Now we would like to calculate the probability that a constraint node makes 
a mistake in such circumstances. Furthermore, we will only the constraint neurons that are 
connected to xf\ Because for the other constraint neurons, the situation is the same as the 
previous cases where there were no external noise. 

for a constraint neuron j that is connected to x( , the decision parameter is 

hf = (wW.{xW + zV)).+ 
= o+(W^.z^).+v j 

= Wjt+Vj 

We consider two error events: 

1) A constraint node j makes a mistake and do not send a message at all. The probability 
of this event is denoted by ty^. 

2) A constraint node j makes a mistake and send a message with the opposite sign. The 
probability of this event is denoted by ix^\ 

We first calculate the probability of 712 . Without loss of generality, assume the Wji > 
so that the probability of an error of type two is as follows (the case for Wji < is exactly 
the same): 

7T^ = Pr{Wji + Vj < —ip} 

= max I 0, — — J — I . (21) 

However, since if; > u and Wj\ > 0, then v — (ip + Wji) < and tt^ = 0. Therefore, the 
constraint neurons will never send a message that has an opposite sign to what it should 
have. All remains to do is to calculate the probability that they remain silent by mistake. 
To this end, we will have 



Vi{\wn + vA <ip} 



i/ + minO - Wji,u)\ 
max ( 0, — J . (22) 

The above equation can be simplified if we assume that the absolute value of all weights in
the network is larger than a constant $\eta > \psi$. Then, the above equation simplifies to

$$\pi_1^{(\ell)} \le \max\left(0, \frac{\nu - (\eta - \psi)}{2\nu}\right). \tag{23}$$

Putting the above equations together, we obtain

$$\pi_1^{(\ell)} \le \max\left(0, \frac{\nu - (\eta - \psi)}{2\nu}\right). \tag{24}$$

In case $\eta - \psi > \nu$, we could even make this probability equal to zero. However,
we will leave it as it is and use equation (24) to calculate $P_{e_1}$.
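The closed forms (21), (22) and the bound (24) can be checked numerically. The following is a minimal sketch, assuming the constraint-side noise $v_j$ is uniform on $[-\nu, \nu]$, that $W_{j1} > 0$ and that $\psi > \nu$; the function names `constraint_error_probs` and `silence_bound` are hypothetical and are not taken from the paper.

```python
def constraint_error_probs(w, psi, nu):
    """Error probabilities of a constraint neuron with decision parameter w + v.

    Assumptions (illustrative): v ~ Uniform[-nu, nu], w > 0, psi > nu.
    Returns (pi1, pi2): probability of staying silent, probability of
    sending a message with the wrong sign.
    """
    # Eq. (21): Pr{w + v < -psi}
    pi2 = max(0.0, (nu - (psi + w)) / (2 * nu))
    # Eq. (22): Pr{|w + v| < psi}
    pi1 = max(0.0, (nu + min(psi - w, nu)) / (2 * nu))
    return pi1, pi2


def silence_bound(eta, psi, nu):
    """Upper bound of Eq. (24), valid when every |weight| >= eta > psi."""
    return max(0.0, (nu - (eta - psi)) / (2 * nu))


if __name__ == "__main__":
    pi1, pi2 = constraint_error_probs(w=0.5, psi=0.3, nu=0.25)
    print(pi1, pi2, silence_bound(eta=0.5, psi=0.3, nu=0.25))
```

With a weight equal to the lower bound $\eta$, the exact value of $\pi_1^{(\ell)}$ meets the bound; larger weights can only reduce it.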



Calculating $P_{e_1}$

We start by calculating the probability that a non-corrupted pattern node $x_j^{(\ell)}$ makes
a mistake, which is to change its state in round 1. Let us denote this probability by $q_1^{(\ell)}$.
To calculate $q_1^{(\ell)}$, assume $x_j^{(\ell)}$ has degree $d_j$ and has $b$ common neighbors with $x_1^{(\ell)}$,
the corrupted pattern node.

Out of these $b$ common neighbors, $b_c$ will send $\pm 1$ messages and the others will, mistakenly,
send nothing. Thus, the decision parameter of pattern node $j$, $g_j^{(\ell)}$, will be bounded by

$$g_j^{(\ell)} = \frac{\left(\mathrm{sign}(W^{(\ell)})^\top \cdot y^{(\ell)}\right)_j}{d_j} + u_j \le \frac{b_c}{d_j} + u_j.$$

We will denote $\left(\mathrm{sign}(W^{(\ell)})^\top \cdot y^{(\ell)}\right)_j$ by $o_j$ for brevity from this point on.
In these circumstances, a mistake happens when $|g_j^{(\ell)}| > \varphi$. Thus

$$\begin{aligned}
q_1^{(\ell)} &= \Pr\left\{|g_j^{(\ell)}| > \varphi \mid \deg(x_j) = d_j \ \&\ |\mathcal{N}(x_1) \cap \mathcal{N}(x_j)| = b\right\} \\
&= \Pr\left\{\frac{o_j}{d_j} + u_j > \varphi\right\} + \Pr\left\{\frac{o_j}{d_j} + u_j < -\varphi\right\},
\end{aligned} \tag{25}$$

where $\mathcal{N}(x_i)$ represents the neighborhood of pattern node $x_i$ among constraint nodes.
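By simplifying equation (25) we get

$$q_1^{(\ell)}(o_j) = \begin{cases}
1, & \text{if } |o_j| \ge (\upsilon + \varphi) d_j \\
\max\left(0, \frac{\upsilon - \varphi}{\upsilon}\right), & \text{if } |o_j| \le |\upsilon - \varphi| d_j \\
\frac{\upsilon - (\varphi - o_j/d_j)}{2\upsilon}, & \text{if } |o_j - \varphi d_j| \le \upsilon d_j \\
\frac{\upsilon - (\varphi + o_j/d_j)}{2\upsilon}, & \text{if } |o_j + \varphi d_j| \le \upsilon d_j.
\end{cases}$$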
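The simplification of equation (25) can be reproduced by clipping the two tail probabilities separately. The sketch below assumes the pattern-side noise $u_j$ is uniform on $[-\upsilon, \upsilon]$ and $\varphi > 0$; the function name `q1` and its argument names are hypothetical, chosen only for illustration.

```python
def q1(o_j, d_j, phi, upsilon):
    """Pr{|o_j/d_j + u| > phi} for u ~ Uniform[-upsilon, upsilon].

    Direct evaluation of the two terms of Eq. (25); assumes phi > 0 so
    that the two tail events are disjoint.
    """
    a = o_j / d_j

    def clip(p):
        # Keep a linear CDF expression inside [0, 1].
        return min(1.0, max(0.0, p))

    upper = clip((upsilon - (phi - a)) / (2 * upsilon))  # Pr{a + u > phi}
    lower = clip((upsilon - (phi + a)) / (2 * upsilon))  # Pr{a + u < -phi}
    return upper + lower


if __name__ == "__main__":
    # One sample point from each regime of the case analysis.
    print(q1(4, 4, 0.5, 0.25))   # |o_j| >= (upsilon + phi) d_j
    print(q1(0, 4, 0.1, 0.25))   # |o_j| <= |upsilon - phi| d_j, upsilon > phi
    print(q1(2, 4, 0.5, 0.25))   # |o_j - phi d_j| <= upsilon d_j
```

Evaluating the clipped tails avoids having to decide which case applies, which is convenient when the expression is later averaged over many values of $o_j$.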

We should now average the above equation over $o_j$, $b_c$, $b$ and $d_j$. To start, suppose that out
of the $b_c$ non-zero messages the node $x_j$ receives, $e$ of them have the same sign as the link
they are being transmitted over. Thus, we will have $o_j = e - (b_c - e) = 2e - b_c$. Assuming
the probability of having the same sign for each message is 1/2, the probability of having $e$
equal signs out of $b_c$ elements will be $\binom{b_c}{e}(1/2)^{b_c}$. Thus, we will get

$$\bar{q}_1^{(\ell)} = \sum_{e=0}^{b_c} \binom{b_c}{e} (1/2)^{b_c}\, q_1^{(\ell)}(2e - b_c). \tag{26}$$

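Now note that the probability of having $b - b_c$ mistakes from the constraint side is given
by $\binom{b}{b_c}(\pi_1^{(\ell)})^{b - b_c}(1 - \pi_1^{(\ell)})^{b_c}$. Thus, and with some abuse of notation, we will get

$$\bar{q}_1^{(\ell)} = \sum_{b_c=0}^{b} \binom{b}{b_c} (\pi_1^{(\ell)})^{b - b_c} (1 - \pi_1^{(\ell)})^{b_c} \sum_{e=0}^{b_c} \binom{b_c}{e} (1/2)^{b_c}\, q_1^{(\ell)}(2e - b_c). \tag{27}$$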

Finally, the probability that $x_j$ and $x_1$ have $b$ common neighbors can be approximated by
$\binom{d_j}{b}(1 - \bar{d}^{(\ell)}/m_\ell)^{d_j - b}(\bar{d}^{(\ell)}/m_\ell)^{b}$, where $\bar{d}^{(\ell)}$ is the average degree of pattern nodes. Thus, and
again with some abuse of notation, we will obtain

$$\bar{q}_1^{(\ell)} = \sum_{b=0}^{d_j} p_b \sum_{b_c=0}^{b} p_{b_c} \sum_{e=0}^{b_c} \binom{b_c}{e} (1/2)^{b_c}\, q_1^{(\ell)}(2e - b_c), \tag{28}$$

where $q_1^{(\ell)}(2e - b_c)$ is given by (9), $p_b$ is the probability of having $b$ common neighbors and
is estimated by $\binom{d_j}{b}(1 - \bar{d}^{(\ell)}/m_\ell)^{d_j - b}(\bar{d}^{(\ell)}/m_\ell)^{b}$, with $\bar{d}^{(\ell)}$ being the average degree of pattern
nodes in cluster $\ell$. Furthermore, $p_{b_c}$ is the probability of having $b - b_c$ out of these $b$ nodes
making mistakes. Hence, $p_{b_c} = \binom{b}{b_c}(\pi_1^{(\ell)})^{b - b_c}(1 - \pi_1^{(\ell)})^{b_c}$. We will not simplify the above
equation any further and use it as it is in our numerical analysis in order to obtain the best
parameter $\varphi$.
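The triple sum in equation (28) is straightforward to evaluate numerically. The following is a minimal sketch under stated assumptions: the pattern-side noise is uniform on $[-\upsilon, \upsilon]$, the inner probability is computed by evaluating the two tails of equation (25) directly, and the names `q1`, `q1_avg`, `dbar` and `m` are hypothetical stand-ins for $q_1^{(\ell)}$, $\bar{q}_1^{(\ell)}$, $\bar{d}^{(\ell)}$ and $m_\ell$.

```python
from math import comb


def q1(o_j, d_j, phi, upsilon):
    """Pr{|o_j/d_j + u| > phi} for u ~ Uniform[-upsilon, upsilon], phi > 0."""
    a = o_j / d_j
    upper = min(1.0, max(0.0, (upsilon - (phi - a)) / (2 * upsilon)))
    lower = min(1.0, max(0.0, (upsilon - (phi + a)) / (2 * upsilon)))
    return upper + lower


def q1_avg(d_j, dbar, m, pi1, phi, upsilon):
    """Average of q1 over b, b_c and e, following the structure of Eq. (28)."""
    p_share = dbar / m  # chance that a given neighbor is shared with x_1
    total = 0.0
    for b in range(d_j + 1):
        # p_b: probability of b common neighbors with the corrupted node.
        p_b = comb(d_j, b) * (1 - p_share) ** (d_j - b) * p_share ** b
        for b_c in range(b + 1):
            # p_bc: b - b_c of the b common neighbors stay silent by mistake.
            p_bc = comb(b, b_c) * pi1 ** (b - b_c) * (1 - pi1) ** b_c
            # Average over e, the number of messages agreeing in sign.
            inner = sum(
                comb(b_c, e) * 0.5 ** b_c * q1(2 * e - b_c, d_j, phi, upsilon)
                for e in range(b_c + 1)
            )
            total += p_b * p_bc * inner
    return total


if __name__ == "__main__":
    print(q1_avg(d_j=4, dbar=3, m=10, pi1=0.02, phi=0.5, upsilon=0.25))
```

Because the three sets of weights each sum to one, the result is a convex combination of values of `q1` and stays in $[0, 1]$.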

Now we will turn our attention to the probability that the corrupted node, $x_1$, makes a
mistake, which is either not to update at all or to update itself in the wrong direction. Recalling
that we have assumed the external noise term in $x_1$ to be a +1 noise, the wrong direction
would be for node $x_1$ to increase its current value instead of decreasing it. Furthermore, we
assume that out of the $d_1$ neighbors of $x_1$, some $j$ of them have made a mistake and will not
send any messages to $x_1$. Thus, the decision parameter of $x_1$ will be $g_1^{(\ell)} = u + (d_1 - j)/d_1$.

Denoting the probability of making a mistake at $x_1$ by $q_2^{(\ell)}$, we will get

$$\begin{aligned}
q_2^{(\ell)} &= \Pr\left\{g_1^{(\ell)} < \varphi \mid \deg(x_1) = d_1 \ \&\ j \text{ errors in constraints}\right\} \\
&= \Pr\left\{\frac{d_1 - j}{d_1} + u < \varphi\right\},
\end{aligned} \tag{29}$$



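which simplifies to

$$q_2^{(\ell)}(j) = \begin{cases}
1, & \text{if } |j| \ge (1 + \upsilon - \varphi) d_1 \\
\max\left(0, \frac{\upsilon - \varphi}{\upsilon}\right), & \text{if } |j| \le (1 - \upsilon - \varphi) d_1 \\
\frac{\upsilon + \varphi - (d_1 - j)/d_1}{2\upsilon}, & \text{if } |\varphi d_1 - (d_1 - j)| \le \upsilon d_1.
\end{cases} \tag{30}$$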


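Noting that the probability of making $j$ mistakes on the constraint side is
$\binom{d_1}{j}(\pi_1^{(\ell)})^{j}(1 - \pi_1^{(\ell)})^{d_1 - j}$, we will get

$$\bar{q}_2^{(\ell)} = \sum_{j=0}^{d_1} \binom{d_1}{j} (\pi_1^{(\ell)})^{j} (1 - \pi_1^{(\ell)})^{d_1 - j}\, q_2^{(\ell)}(j), \tag{31}$$

where $q_2^{(\ell)}(j)$ is given by equation (30).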
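The binomial average in equation (31) can be sketched in a few lines. This is an illustrative example under stated assumptions: the noise $u$ is uniform on $[-\upsilon, \upsilon]$, and the per-$j$ probability is obtained by evaluating the probability in equation (29) directly through the clipped uniform CDF rather than by transcribing a case analysis. The names `q2` and `q2_avg` are hypothetical.

```python
from math import comb


def q2(j, d1, phi, upsilon):
    """Pr{(d1 - j)/d1 + u < phi} for u ~ Uniform[-upsilon, upsilon], Eq. (29)."""
    p = (upsilon + phi - (d1 - j) / d1) / (2 * upsilon)
    return min(1.0, max(0.0, p))


def q2_avg(d1, pi1, phi, upsilon):
    """Average of q2 over the number j of silent neighbors, as in Eq. (31)."""
    return sum(
        comb(d1, j) * pi1 ** j * (1 - pi1) ** (d1 - j) * q2(j, d1, phi, upsilon)
        for j in range(d1 + 1)
    )


if __name__ == "__main__":
    # Hypothetical parameters: degree 4, pi1 = 0.02 as in the numerical study.
    print(q2_avg(d1=4, pi1=0.02, phi=0.9, upsilon=0.25))
```

When no constraint neuron errs, only the $j = 0$ term survives; when every neighbor stays silent the corrupted node receives no feedback and, for $\varphi > \upsilon$, fails to update with probability one.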



Putting the above results together, the overall probability of making a mistake on the side
of pattern neurons when we have one bit of external noise is given by

$$P_{e_1} = \frac{1}{n^{(\ell)}}\,\bar{q}_2^{(\ell)} + \frac{n^{(\ell)} - 1}{n^{(\ell)}}\,\bar{q}_1^{(\ell)}. \tag{32}$$

We will use this equation in order to find the best update threshold $\varphi$.

Investigating the effect of choosing proper $\varphi$



We now apply numerical methods to equation (32) in order to find the best $\varphi$ for different
values of the noise parameter $\upsilon$. The following figures show the best choice for the parameter
$\varphi$. The update threshold on the constraint side is chosen such that $\psi > \nu$. In each figure, we
have also illustrated the final probability of making a mistake, $P_{e_1}$, for comparison.

Figure 3 illustrates the behavior of the error probability as a function of $\varphi$ for different
values of $\upsilon$ and for $\pi_1^{(\ell)} = 0.02$.

The interesting trend here is that in all cases $\varphi^*$, the update threshold that gives the best
result, is quite large. This is in line with our expectation,
because a small $\varphi$ will cause non-corrupted nodes to update their states more frequently.
On the other hand, a very large $\varphi$ will prevent the corrupted nodes from correcting their states,
especially if there are some mistakes made on the constraint side, i.e. $\pi_1^{(\ell)} > 0$. Therefore,
since there are many more non-corrupted nodes than corrupted nodes, it is best to choose
a rather high $\varphi$, but not too high.

Please also note that when $\pi_1^{(\ell)}$ is very high, there are no values of $\upsilon$ for which error-free
storage is possible.




Fig. 12: The behavior of $P_{e_1}$ as a function of $\varphi$ for different values of noise parameter, $\upsilon$
and 



